Translation Using JAPIO Patent Corpora: JAPIO at WAT2016
Abstract
Japan Patent Information Organization (JAPIO) participates in the scientific paper subtask (ASPEC-EJ/CJ) and the patent subtask (JPC-EJ/CJ/KJ) with phrase-based SMT systems trained on its own patent corpora. Using larger corpora than those prepared by the workshop organizers, we achieved higher BLEU scores than most participants in the EJ and CJ translations of the patent subtask. However, in the crowdsourcing evaluation, our EJ translation, which was the best in all automatic evaluations, received a very poor score. In the scientific paper subtask, our translations received lower scores than most translations produced by engines trained on the in-domain corpora, but higher scores than those of general-purpose RBMTs and online services. The crowdsourcing evaluation results suggest that a CJ SMT system trained on a large patent corpus may be able to translate non-patent technical documents at a practical level.
Similar resources
Use of the Japio Technical Field Dictionaries and Commercial Rule-based Engine for NTCIR-PatentMT
Japio performs various patent-related translation businesses, and owns the original patent-document-derived bilingual technical term database (Japio Terminology Database) to be used by the translators. Currently the database contains more than 1,900,000 J-E bilingual technical terms. The Japio Technical Field Dictionaries (technical-field-oriented machine translation dictionaries) are created f...
Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017
Japan Patent Information Organization (Japio) participates in patent subtasks (JPC-EJ/JE/CJ/KJ) with phrase-based statistical machine translation (SMT) and neural machine translation (NMT) systems which are trained with its own patent corpora in addition to the subtask corpora provided by organizers of WAT2017. In EJ and CJ subtasks, SMT and NMT systems whose sizes of training corpora are about...
Use of the Japio Technical Field Dictionaries for NTCIR-PatentMT
Japio performs various patent-related translation businesses, and owns the original patent-document-derived bilingual technical term database (Japio Terminology Database) to be used by the translators. Currently the database contains more than 1,000,000 J-E technical terms. The Japio Technical Field Dictionaries (technical-field-oriented machine translation dictionaries) are created from the Ja...
Use of the Technical Field-Oriented User Dictionaries
Japio performs various patent-related translation businesses, and owns the original patent-document-derived bilingual technical term database (Japio Terminology Database) to be served for the translators. Currently the database contains more than 780,000 J-E technical terms. To adapt the database to the Patent Translation Task, Japio compiled machine translation dictionaries from it. 34 technica...
NTT DTEC at Patent Retrieval Task
Search effectiveness is investigated when a corpus is created by using only “Title,” “Abstract,” and “Claims,” which are expected to briefly express the invention, instead of using the entire document in the search for documents similar to a patent application. In addition, the JAPIO patent abstract that expresses the invention is used to make a comparison with the search effectiveness of “Titl...
Publication date: 2016